On Optimality of Myopic Policy for Restless Multi-armed Bandit Problem with Non i.i.d. Arms and Imperfect Detection
Authors
Abstract
We consider the channel access problem in a multi-channel opportunistic communication system with imperfect channel sensing, where the state of each channel evolves as a non-independent and identically distributed (non-i.i.d.) Markov process. This problem can be cast as a restless multi-armed bandit (RMAB) problem, which is intractable due to its exponential computational complexity. A natural alternative is the easily implementable myopic policy, which maximizes the immediate reward but ignores the impact of the current strategy on the future reward. In particular, we analyze a family of generic and practically important functions, termed g-regular functions and characterized by three axioms, and establish a set of closed-form structural conditions for the optimality of the myopic policy.
Index Terms: restless multi-armed bandit (RMAB), myopic policy, opportunistic spectrum access (OSA), imperfect detection
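To make the myopic rule described in the abstract concrete, the following is a minimal simulation sketch, not the paper's model or proof setting: the channel parameters P11 and P01, the false-alarm and miss-detection probabilities EPS and DELTA, the reward convention (one unit per successful transmission on a sensed-idle channel), and all helper names are assumptions made only for illustration. It keeps a belief (probability of being in the good state) per channel, probes the K channels with the largest expected immediate reward, and updates beliefs by a Bayes step followed by one-step Markov prediction.

```python
import numpy as np

# Hypothetical setup: N independent Gilbert-Elliott channels, state 1 = good/idle,
# state 0 = bad/busy.  Sensing is imperfect: with probability EPS the sensor raises
# a false alarm (declares a good channel bad), with probability DELTA it misses
# (declares a bad channel good).  The myopic policy probes the K channels with the
# largest expected immediate reward and ignores future effects.

rng = np.random.default_rng(0)

N, K, T = 8, 2, 10_000        # channels, probes per slot, horizon (assumed)
P11, P01 = 0.8, 0.3           # P(good -> good), P(bad -> good)     (assumed)
EPS, DELTA = 0.1, 0.05        # false-alarm / miss probabilities    (assumed)

def propagate(omega):
    """One-step Markov prediction of the belief (prob. channel is good)."""
    return omega * P11 + (1.0 - omega) * P01

def bayes_update(omega, sensed_good):
    """Posterior belief after one imperfect observation of a channel."""
    if sensed_good:
        num = omega * (1.0 - EPS)
        den = num + (1.0 - omega) * DELTA
    else:
        num = omega * EPS
        den = num + (1.0 - omega) * (1.0 - DELTA)
    return num / den

state = (rng.random(N) < 0.5).astype(int)        # hidden channel states
omega = np.full(N, P01 / (1.0 - P11 + P01))      # stationary initial belief
total_reward = 0.0

for t in range(T):
    # Myopic choice: expected immediate reward of probing channel i is
    # (1 - EPS) * omega[i] when we transmit only on a sensed-good channel.
    probes = np.argsort(-(1.0 - EPS) * omega)[:K]

    for i in probes:
        sensed_good = rng.random() < ((1.0 - EPS) if state[i] else DELTA)
        if sensed_good and state[i]:
            total_reward += 1.0                   # successful transmission
        omega[i] = bayes_update(omega[i], sensed_good)

    omega = propagate(omega)                      # all beliefs drift one step
    state = (rng.random(N) < np.where(state == 1, P11, P01)).astype(int)

print("average reward per slot:", total_reward / T)
```

The structural conditions studied in the paper are about when this greedy selection coincides with the optimal policy; the sketch only shows what the greedy selection itself computes.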
Similar Resources
On Optimality of Greedy Policy for a Class of Standard Reward Function of Restless Multi-armed Bandit Problem
In this paper, we consider the restless bandit problem, which is one of the most well-studied generalizations of the celebrated stochastic multi-armed bandit problem in decision theory. However, it is known to be PSPACE-hard to approximate to any non-trivial factor. Thus the optimal policy is very difficult to obtain due to its high complexity. A natural method is to obtain the greedy policy considerin...
Structure and Optimality of Myopic Policy in Opportunistic Access with Noisy Observations
A restless multi-armed bandit problem that arises in multichannel opportunistic communications is considered, where channels are modeled as independent and identical Gilbert-Elliott channels and channel state observations are subject to errors. A simple structure of the myopic policy is established under a certain condition on the false alarm probability of the channel state detector. It is show...
Indexability of Restless Bandit Problems and Optimality of Index Policies for Dynamic Multichannel Access
We consider an opportunistic communication system consisting of multiple independent channels with time-varying states. With limited sensing, a user can only sense and access a subset of channels and accrue rewards determined by the states of the sensed channels. We formulate the problem of optimal sequential channel selection as a restless multi-armed bandit process. We establish the indexabil...
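The index policy mentioned in this entry can be illustrated numerically. The sketch below is a rough illustration under assumed parameters (P11, P01, the discount factor BETA, and the belief grid are invented for the example, and the helper names are hypothetical): it treats a single Gilbert-Elliott arm whose state is observed only when it is played, runs value iteration with a subsidy for the passive action, and bisects on that subsidy to locate the Whittle index of a given belief, relying on the indexability property that the cited paper establishes.

```python
import numpy as np

# Whittle index of one two-state arm, computed numerically on a belief grid.
# The belief omega = P(state is good) is the sufficient statistic; playing the
# arm reveals its state, resting lets the belief drift under the Markov chain.

P11, P01 = 0.8, 0.3          # P(good -> good), P(bad -> good)  (assumed)
BETA = 0.9                   # discount factor                   (assumed)
GRID = np.linspace(0.0, 1.0, 501)

def nearest(omega):
    """Index of the grid point closest to a belief value."""
    return int(round(omega * (len(GRID) - 1)))

def value_iteration(lam, iters=300):
    """Value function of the single arm when the passive action pays subsidy lam."""
    V = np.zeros_like(GRID)
    drift_idx = [nearest(w * P11 + (1.0 - w) * P01) for w in GRID]
    i11, i01 = nearest(P11), nearest(P01)
    for _ in range(iters):
        passive = lam + BETA * V[drift_idx]
        active = GRID + BETA * (GRID * V[i11] + (1.0 - GRID) * V[i01])
        V = np.maximum(passive, active)
    return V

def whittle_index(omega, lo=-1.0, hi=2.0, tol=1e-3):
    """Subsidy at which resting and playing are equally good at belief omega."""
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        V = value_iteration(lam)
        drift = omega * P11 + (1.0 - omega) * P01
        passive = lam + BETA * V[nearest(drift)]
        active = omega + BETA * (omega * V[nearest(P11)]
                                 + (1.0 - omega) * V[nearest(P01)])
        if active > passive:
            lo = lam          # playing still better: index exceeds lam
        else:
            hi = lam
    return 0.5 * (lo + hi)

for w in (0.2, 0.5, 0.8):
    print(f"belief {w:.1f}: Whittle index ~ {whittle_index(w):.3f}")
```

An index policy then simply plays, in each slot, the arms with the largest current index values.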
Multi-armed Bandits with Constrained Arms and Hidden States
The problem of rested and restless multi-armed bandits with constrained availability of arms is considered. The states of the arms evolve in a Markovian manner and the exact states are hidden from the decision maker. First, some structural results on value functions are claimed. Following these results, the optimal policy turns out to be a threshold policy. Further, indexability of rested bandits is ...
Learning in A Changing World: Non-Bayesian Restless Multi-Armed Bandit
We consider the restless multi-armed bandit (RMAB) problem with unknown dynamics. In this problem, at each time, a player chooses K out of N (N > K) arms to play. The state of each arm determines the reward when the arm is played and transitions according to Markovian rules whether the arm is engaged or passive. The Markovian dynamics of the arms are unknown to the player. The objective is to ma...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
Journal: CoRR
Volume: abs/1205.5375
Pages: -
Publication date: 2012